undetectable backdoor
Injecting Undetectable Backdoors in Obfuscated Neural Networks and Language Models
As ML models become increasingly complex and integral to high-stakes domains such as finance and healthcare, they also become more susceptible to sophisticated adversarial attacks. We investigate the threat posed by undetectable backdoors, as defined in Goldwasser et al. [2022], in models developed by insidious external expert firms. When such backdoors exist, they allow the designer of the model to sell information on how to slightly perturb their input to change the outcome of the model. We develop a general strategy to plant backdoors to obfuscated neural networks, that satisfy the security properties of the celebrated notion of indistinguishability obfuscation. Applying obfuscation before releasing neural networks is a strategy that is well motivated to protect sensitive information of the external expert firm. Our method to plant backdoors ensures that even if the weights and architecture of the obfuscated model are accessible, the existence ofthe backdoor is still undetectable. Finally, we introduce the notion of undetectable backdoors to language models and extend our neural network backdoor attacks to such models based on the existence of steganographic functions.
Planting Undetectable Backdoors in Machine Learning Models
Goldwasser, Shafi, Kim, Michael P., Vaikuntanathan, Vinod, Zamir, Or
Given the computational cost and technical expertise required to train machine learning models, users may delegate the task of learning to a service provider. We show how a malicious learner can plant an undetectable backdoor into a classifier. On the surface, such a backdoored classifier behaves normally, but in reality, the learner maintains a mechanism for changing the classification of any input, with only a slight perturbation. Importantly, without the appropriate "backdoor key", the mechanism is hidden and cannot be detected by any computationally-bounded observer. We demonstrate two frameworks for planting undetectable backdoors, with incomparable guarantees. First, we show how to plant a backdoor in any model, using digital signature schemes. The construction guarantees that given black-box access to the original model and the backdoored version, it is computationally infeasible to find even a single input where they differ. This property implies that the backdoored model has generalization error comparable with the original model. Second, we demonstrate how to insert undetectable backdoors in models trained using the Random Fourier Features (RFF) learning paradigm or in Random ReLU networks. In this construction, undetectability holds against powerful white-box distinguishers: given a complete description of the network and the training data, no efficient distinguisher can guess whether the model is "clean" or contains a backdoor. Our construction of undetectable backdoors also sheds light on the related issue of robustness to adversarial examples. In particular, our construction can produce a classifier that is indistinguishable from an "adversarially robust" classifier, but where every input has an adversarial example! In summary, the existence of undetectable backdoors represent a significant theoretical roadblock to certifying adversarial robustness.
- North America > United States > Maryland > Baltimore (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > Canada > Quebec > Montreal (0.04)
- (11 more...)
Injecting Undetectable Backdoors in Deep Learning and Language Models
Kalavasis, Alkis, Karbasi, Amin, Oikonomou, Argyris, Sotiraki, Katerina, Velegkas, Grigoris, Zampetakis, Manolis
As ML models become increasingly complex and integral to high-stakes domains such as finance and healthcare, they also become more susceptible to sophisticated adversarial attacks. We investigate the threat posed by undetectable backdoors in models developed by insidious external expert firms. When such backdoors exist, they allow the designer of the model to sell information to the users on how to carefully perturb the least significant bits of their input to change the classification outcome to a favorable one. We develop a general strategy to plant a backdoor to neural networks while ensuring that even if the model's weights and architecture are accessible, the existence of the backdoor is still undetectable. To achieve this, we utilize techniques from cryptography such as cryptographic signatures and indistinguishability obfuscation. We further introduce the notion of undetectable backdoors to language models and extend our neural network backdoor attacks to such models based on the existence of steganographic functions.
- North America > Canada (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > California > Santa Barbara County > Santa Barbara (0.04)
- (2 more...)
Machine learning has an alarming threat: undetectable backdoors
This article is part of our coverage of the latest in AI research. If an adversary gives you a machine learning model and secretly plants a malicious backdoor in it, what are the chances that you can discover it? The security of machine learning is becoming increasingly critical as ML models find their way into a growing number of applications. The new study focuses on the security threats of delegating the training and development of machine learning models to third parties and service providers. With the shortage of AI talent and resources, many organizations are outsourcing their machine learning work, using pre-trained models or online ML services.
Machine learning has a backdoor problem
This article is part of our coverage of the latest in AI research. If an adversary gives you a machine learning model and secretly plants a malicious backdoor in it, what are the chances that you can discover it? The security of machine learning is becoming increasingly critical as ML models find their way into a growing number of applications. The new study focuses on the security threats of delegating the training and development of machine learning models to third parties and service providers. With the shortage of AI talent and resources, many organizations are outsourcing their machine learning work, using pre-trained models or online ML services.
AIs could be hacked with undetectable backdoors to make bad decisions
Artificial intelligence is increasingly used in business. But because of the way it is built, there is theoretical potential for the software to contain undetectable features that bypass its normal decision-making process, meaning it could be exploited by malicious third parties. For instance, an AI model tasked with shortlisting CVs for a job vacancy could be made to covertly prioritise any which include a deliberately obscure phrase.
Machine-learning models vulnerable to undetectable backdoors
Boffins from UC Berkeley, MIT, and the Institute for Advanced Study in the United States have devised techniques to implant undetectable backdoors in machine learning (ML) models. Their work suggests ML models developed by third parties fundamentally cannot be trusted. In a paper that's currently being reviewed – "Planting Undetectable Backdoors in Machine Learning Models" – Shafi Goldwasser, Michael Kim, Vinod Vaikuntanathan, and Or Zamir explain how a malicious individual creating a machine learning classifier – an algorithm that classifies data into categories (eg "spam" or "not spam") – can subvert the classifier in a way that's not evident. "On the surface, such a backdoored classifier behaves normally, but in reality, the learner maintains a mechanism for changing the classification of any input, with only a slight perturbation," the paper explains. "Importantly, without the appropriate'backdoor key,' the mechanism is hidden and cannot be detected by any computationally-bounded observer."